In this lesson, we will incorporate analysis of wind directions to help explain origins of air pollutants.

Load libraries; set preferences

library(tidyverse)
library(chron)
Sys.setlocale("LC_TIME","C")
options(stringsAsFactors=FALSE)
options(chron.year.abb=FALSE)
theme_set(theme_bw()) # just my preference for plots

Read in wind data

In previous modules, we covered temperature, precipitation, and radiation intensity. Here we will use wind speed and direction.

An example met file provided by NABEL looks like the following:

cat(encodeString(readLines("data/2013/LAU_Wind_MW1_13.txt", n=20)), sep="\n")
## Lausanne-C\xe9sar-Roux
## Messstation Nr.:   9
## MW1                vom 01.01.2013  00:10 Uhr
##                    bis 31.12.2013  24:00 Uhr
## Bezeichnung der Datenkolonnen:
## Kol Messobjekt  MSNR  Einheit  Position  Text
##   1 Jahr           -  -          1-  4   Jahr
##   2 Monat          -  -          6-  7   Monat
##   3 Tag            -  -          9- 10   Tag
##   4 Stunde         -  -         12- 13   Stunde
##   5 Minute         -  -         15- 16   Minute
##   6 WIRI           -  \xb0         18- 27   Windrichtung
##   7 WIGE           -  m/s       28- 37   Windgeschwindigkeit
## Ausgefallene Werte werden durch den Wert -9999 gekennzeichnet.
## X
## 2013 01 01 01 00 75.4308 0.2228
## 2013 01 01 02 00 98.4915 0.3154
## 2013 01 01 03 00 87.5222 0.4424
## 2013 01 01 04 00 21.2496 0.2589
## 2013 01 01 05 00 322.2944 0.3633

Columns 6 and 7 are wind direction (in degrees) and wind speed (m/s), respectively.

We will define a function for reading in meteorological files.

ReadMet <- function(filename) {
  data <- read.table(filename, skip=15, col.names=c("year", "month", "day", "hour", "minute", "WIRI", "WIGE"))
  data %>%
    mutate(datetime = as.chron(paste(year, month, day, hour, minute), "%Y %m %d %H %M"),
           year     = years(datetime),
           month    = months(datetime),
           day      = days(datetime),
           hour     = hours(datetime),
           minute   = minutes(datetime),
           WIRI     = ifelse(WIRI <= -9999, NA, WIRI),
           WIGE     = ifelse(WIGE <= -9999, NA, WIGE))
}

Read in met data:

datapath <- "data/2013"
met <- full_join(cbind(site="LAU", ReadMet(file.path(datapath, "LAU_Wind_MW1_13.txt"))),
                 cbind(site="ZUE", ReadMet(file.path(datapath, "ZUE_Wind_MW1_13.txt"))))

Merge with concentrations

Read in concentration data using functions defined in Lesson 4:

conc <- readRDS("data/2013/lau-zue.rds")

We will merge the data frames. In this case, the wind data is already averaged into hourly intervals so has the same time basis as the pollutant concentrations. For the data provided for your exercise, you will want to average the 5-minute time resolution to hourly resolution drawing upon methods shown in previous lessons (hint: use summarize).

For averaging wind direction, you need to convert angle values (polar coordinate representation) to Cartesian coordinates before averaging. Explanation - if you have two angles, 1\(^\circ\) and 359\(^\circ\), the arithmetic mean is 180\(^\circ\), rather than 0\(^\circ\). An algorithm for averaging in Cartesian coordinates can be expressed as follows:

\[\begin{align*} \bar{x} &= \langle \cos\theta \rangle\\ \bar{y} &= \langle \sin\theta \rangle\\ \bar{\theta} &= \operatorname{arctan2}(\bar{y}, \bar{x}) \end{align*}\]

The function below shows an implemention in R that can be used in place of the arithmetic mean (mean) when applying to wind directions:

mean.angle <- function(theta, r=1, ...) {
  ## Function for averaging angles
  ## Polar coordinates -> Cartesian coordinates -> polar coordinates
  ##   'theta' is in degrees
  ##   'r=1' for unit circle
  ##   returns value is mean theta in degrees
  theta.rad <- theta * pi/180
  x <- mean(r * cos(theta.rad), ...)
  y <- mean(r * sin(theta.rad), ...)
  theta.deg <- atan2(y, x) * 180/pi
  ifelse(sign(theta.deg) < 0, (theta.deg + 360) %% 360, theta.deg) # -179--180 to 0--359
}

Example application:

mean(c(359, 1)) # arithmetic mean
## [1] 180
mean.angle(c(359, 1)) # mean in Cartesian coordinates
## [1] 0

After obtaining hourly values, you can use the join operation. Here we use left_join to merge to periods for which we have air pollution data (in conc). Note that we also drop the datetime column from the met data frame so that we do not merge on numeric values (the numeric column named datetime also appears in conc), which is generally bad practice (it is better to merge based on character or factor columns).

df <- left_join(conc,
                met %>% select(-datetime) %>% mutate(year=as.character(year)))
                                        # conc[["year"]] is otherwise incompatible with met[["year"]]
tail(df)
##       site              datetime  O3  NO2  CO PM10 TEMP PREC  RAD year month
## 17563  ZUE (12/31/2013 18:00:00) 1.4 49.1 0.6 28.5  1.0    0 -2.7 2013   Dec
## 17564  ZUE (12/31/2013 19:00:00) 1.5 47.9 0.6 31.9  0.2    0 -2.4 2013   Dec
## 17565  ZUE (12/31/2013 20:00:00) 2.0 48.6 0.7 34.1 -0.1    0 -2.5 2013   Dec
## 17566  ZUE (12/31/2013 21:00:00) 2.5 48.4 0.7 40.7  0.0    0 -2.4 2013   Dec
## 17567  ZUE (12/31/2013 22:00:00) 2.2 43.0 0.6 48.5 -0.4    0 -2.5 2013   Dec
## 17568  ZUE (12/31/2013 23:00:00) 2.4 43.5 0.6 53.3 -0.7    0 -2.5 2013   Dec
##       day hour dayofwk daytype season SO2 NMVOC  EC minute     WIRI   WIGE
## 17563  31   18     Tue Weekday    DJF 3.4   0.2 2.4      0 339.7477 1.2211
## 17564  31   19     Tue Weekday    DJF 3.4   0.2 2.4      0 331.6524 0.6470
## 17565  31   20     Tue Weekday    DJF 3.3   0.2 2.6      0 339.0621 1.4807
## 17566  31   21     Tue Weekday    DJF 4.2   0.2 2.6      0 337.1875 0.7424
## 17567  31   22     Tue Weekday    DJF 3.6   0.2 2.5      0 344.4312 1.1424
## 17568  31   23     Tue Weekday    DJF 4.4   0.2 2.5      0 351.2632 1.0341

Let us look at a summary of hourly wind speeds by season (now that we have obtained the seasons from the concentration data frame df):

df %>%
  mutate(hour = factor(hour)) %>%
  ggplot+
  facet_grid(site~season)+
  geom_boxplot(aes(hour, WIGE))+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

For summarizing wind directions, we will use wind roses. We will examine cases in which we want to visualize wind directions for specific days.

Visualizing wind roses (histograms)

We will use this function originally modified from Andy Clifton for plotting wind roses.

source("WindRose.R")

Wind roses are effectively histograms in polar coordinates. There are different conventions for ordering frequency by wind speed. Two are shown below. First, low wind speeds in the interior:

df %>%
  filter(site=="ZUE") %>%
  plotWindrose(spd = "WIGE", dir = "WIRI", decreasing=FALSE)

                                        # (decreasing=FALSE is the default in our code)

High wind speeds in the interior:

df %>%
  filter(site=="ZUE") %>%
  plotWindrose(spd = "WIGE", dir = "WIRI", decreasing=TRUE)

These plots can give different visual impressions. We will adopt the first convention for the remaining examples.

Using syntax shown previously, we can group summaries by site and season:

plotWindrose(df, spd = "WIGE", dir = "WIRI") +
  facet_grid(site~season)

Visualizing wind direction in time

Let us select the last week in July.

example.dates <- dates("08/01/2013") + seq(-6, 0)
example.df <- df %>% filter(dates(datetime) %in% example.dates)
str(example.df)
## 'data.frame':    336 obs. of  22 variables:
##  $ site    : chr  "LAU" "LAU" "LAU" "LAU" ...
##  $ datetime: 'chron' num  (07/26/2013 00:00:00) (07/26/2013 01:00:00) (07/26/2013 02:00:00) (07/26/2013 03:00:00) (07/26/2013 04:00:00) ...
##   ..- attr(*, "format")= Named chr [1:2] "m/d/y" "h:m:s"
##   .. ..- attr(*, "names")= chr [1:2] "dates" "times"
##   ..- attr(*, "origin")= Named num [1:3] 1 1 1970
##   .. ..- attr(*, "names")= chr [1:3] "month" "day" "year"
##  $ O3      : num  44.8 33.6 48.8 48.3 55.5 57.3 21 40.4 53.8 58.2 ...
##  $ NO2     : num  37.2 42.5 31.9 31.4 30.7 30.4 70 50.4 51.9 59.2 ...
##  $ CO      : num  0.4 0.3 0.3 0.3 0.3 0.3 0.5 0.5 0.4 0.4 ...
##  $ PM10    : num  19.1 17.3 17.3 15.8 19.2 16.2 19.4 23.6 24 24.2 ...
##  $ TEMP    : num  23.6 23.3 22.7 22.3 22.1 22 22.2 22.8 23.5 25 ...
##  $ PREC    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ RAD     : num  -2.9 -2.9 -2.9 -2.8 -2.7 ...
##  $ year    : chr  "2013" "2013" "2013" "2013" ...
##  $ month   : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ day     : Ord.factor w/ 31 levels "1"<"2"<"3"<"4"<..: 26 26 26 26 26 26 26 26 26 26 ...
##  $ hour    : num  0 1 2 3 4 5 6 7 8 9 ...
##  $ dayofwk : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tue"<..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ daytype : chr  "Weekday" "Weekday" "Weekday" "Weekday" ...
##  $ season  : Factor w/ 4 levels "DJF","MAM","JJA",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ SO2     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ NMVOC   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ EC      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ minute  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ WIRI    : num  40 30.3 77.7 58.4 52.9 ...
##  $ WIGE    : num  0.192 0.169 0.185 0.262 0.218 ...

If we plot time series:

example.df %>%
  gather(variable, value, c(WIRI, WIGE)) %>%
  ggplot+
  facet_grid(variable~site, scale="free_y")+
  geom_line(aes(datetime, value))+
  scale_x_chron(name="date")+
  theme(axis.text.x = element_text(angle = 45, hjust=1))

We find that the wind speed makes sense but wind directions appear erratic because the wind direction is a periodic function.

Let us consider a better way to represent wind directions. We will map directions to radial coordinates (hue) in the HSV color wheel, which also uses a color scheme that is periodic. To understand this color wheel, let us generally speak about color models which describe an abstraction of colors in mathematical coordinate systems. Most people are familiar with the RGB color cube, described in Cartesian coordinates, shown on the left below. The HSV color wheel describes colors in a cylindrical coordinate system, and is shown on the right.

https://upload.wikimedia.org/wikipedia/commons/8/83/RGB_Cube_Show_lowgamma_cutout_b.png https://upload.wikimedia.org/wikipedia/commons/0/0d/HSV_color_solid_cylinder_alpha_lowgamma.png
RGB (left) and HSV (right) color coordinates (images from Wikipedia).


Examining the HSV color wheel, we can see that it may be appropriate to describe the wind direction, which is periodic, using the hue (H), and allowing the saturation (S) or value (V) to represent wind speed. However, as these subtle color differences may be challenging to differentiate, we will only map H to the wind direction; setting S=1 and V=1.

Colors generated using hsv() function in R.

color <- hsv(seq(0, 360) / 360, 1, 1)
angle <- seq(0, 360, 60)

Let us revisit our earlier example:

ggplot(example.df) +
  facet_grid(site~.)+
  geom_line(aes(datetime, WIGE), color="gray")+
  geom_point(aes(datetime, WIGE, color=WIRI))+
  scale_color_gradientn(colors = color, breaks=angle, limits=range(angle), expand=c(0, 0))+
  scale_x_chron(name="date", limits=range(example.dates))

Examples

We will revisit some anomalous periods identified in the previous module.

We will use the date variable often, so we save the variable to a different column:

df[["date"]] <- dates(df[["datetime"]])

Example 1

Select dates in which daily mean PM10 exceeded 50 \(\mu\)g/m\(^3\) threshold:

PercentRecovery <- function(x)
  length(na.omit(x)) / length(x) * 1e2

daily.zue <- df %>%
  filter(site=="ZUE") %>%                               # only Zurich
  group_by(site, date) %>%                              # conceptually "split" the data by site and date
  summarize(percent.recovery = PercentRecovery(PM10),   # calculate percent recovery
            PM10.mean = mean(PM10, na.rm=TRUE)) %>%     # calculate daily mean
  ungroup                                               # undo the group_by for later operations

hd <- left_join(
  daily.zue %>%
  filter(percent.recovery >= 75 & PM10.mean > 50) %>%
  select(site, date),
  df
)

Define plots:

## concentrations
ggp1 <- hd %>%
  mutate(date = format(date, "y.m.d")) %>%
  ggplot+
  geom_line(aes(hour, PM10))+
  geom_point(aes(hour, PM10))+
  facet_grid(.~date)

## wind speed and direction
ggp2 <- hd %>%
  mutate(date = format(date, "y.m.d")) %>%
  ggplot+
  geom_line(aes(hour, WIGE), color="gray")+
  geom_point(aes(hour, WIGE, color=WIRI))+
  scale_color_gradientn(colors = color, breaks=angle, limits=range(angle), expand=c(0, 0))+
  facet_grid(.~date)
  ##guides(color = "none")

Display plots:

print(ggp1)

print(ggp2)

Let us summarize wind speed and directions between periods in which the 50 \(\mu\)g/m\(^3\) threshold was both exceeded and not exceeded for contrast. The difference with the join operation above is that we keep all the rows with the following operation (as we do not apply a filter to select only exceeded rows):

wd <- left_join(
  daily.zue %>%
  mutate(category = ifelse(PM10.mean > 50, "exceeded", "not exceeded")) %>%
  select(site, date, category),
  df
)

We see that these events only occur during the winter and spring seasons ("DJF" and "MAM"):

wd %>%
  filter(category=="exceeded") %>% # select only exceeded dates
  distinct(season)                 # only unique values
## # A tibble: 2 x 1
##   season
##   <fct> 
## 1 DJF   
## 2 MAM

In addition, the wind consistently came from the North during these periods.

wd %>%
  filter(season %in% c("DJF", "MAM")) %>%
  plotWindrose(spd = "WIGE", dir = "WIRI") +
  facet_grid(.~category)

To see the left figure more clearly, we can filter only the "exceeded" values.

wd %>%
  filter(season %in% c("DJF", "MAM") & category=="exceeded") %>%
  plotWindrose(spd = "WIGE", dir = "WIRI") +
  ggtitle("exceeded")

Example 2

Select dates in which hourly PM10 exceeded 100 \(\mu\)g/m\(^3\). The left_join operation effectively selects rows in data frame df with site and date variables in which the values are exceeded:

exceeded.zue.100 <- df %>%
  filter(site=="ZUE") %>%                      # only Zurich
  group_by(site, date) %>%                     # conceptually "split" the data by site and date
  summarize(is.exceeded = any(PM10 > 100)) %>% # determine whether any hourly value exceeds 100 for each day
  ungroup

hd <- left_join(
  exceeded.zue.100 %>% filter(is.exceeded),    # remove days (rows) which do not have exceeding periods
  df
)

Define plots:

## concentrations
ggp1 <- hd %>%
  mutate(datelabel = format(date, "y.m.d")) %>%
  ggplot+
  geom_line(aes(hour, PM10))+
  facet_grid(.~datelabel)

## wind speed and direction
ggp2 <- hd %>%
  mutate(datelabel = format(date, "y.m.d")) %>%
  ggplot+
  geom_line(aes(hour, WIGE), color="gray")+
  geom_point(aes(hour, WIGE, color=WIRI))+
  scale_color_gradientn(colors = color, breaks=angle, limits=range(angle), expand=c(0, 0))+
  facet_grid(.~datelabel)

Display plots:

print(ggp1)

print(ggp2)

The events are likely localized in space, and concentrations decrease as the wind speeds pick up in the early mornings (also with new boundary layer forming in the morning hours).